Human Shape from Silhouettes Using Generative HKS Descriptors and Cross-Modal Neural Networks
In this work, we present a novel method for capturing human body shape from a single scaled silhouette. We combine deep correlated features capturing different 2D views, and embedding spaces based on 3D cues, in a novel convolutional neural network (CNN) based architecture. We first train a CNN to find a richer body shape representation space from pose-invariant 3D human shape descriptors. Then, we learn a mapping from silhouettes to this representation space, with the help of a novel architecture that exploits the correlation of multi-view data during training time to improve prediction at test time. We extensively validate our results on synthetic and real data, demonstrating significant improvements in accuracy compared to the state of the art, and providing a practical system for detailed human body measurements from a single image.
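The title's "HKS descriptors" refer to the heat kernel signature, a pose-invariant per-vertex shape descriptor computed from the spectrum of a mesh Laplacian: HKS_t(x) = Σ_i exp(−λ_i t) φ_i(x)². As a hedged illustration only (not the authors' implementation), the following minimal numpy sketch computes HKS from a toy graph Laplacian standing in for a mesh:

```python
import numpy as np

def heat_kernel_signature(L, times, k=None):
    """HKS_t(x) = sum_i exp(-lambda_i * t) * phi_i(x)^2, computed from
    the eigendecomposition of a (mesh or graph) Laplacian L."""
    evals, evecs = np.linalg.eigh(L)          # ascending eigenvalues
    if k is not None:                          # optionally truncate spectrum
        evals, evecs = evals[:k], evecs[:, :k]
    # rows: vertices, columns: diffusion times
    return (evecs ** 2) @ np.exp(-np.outer(evals, times))

# Toy Laplacian of a 4-vertex path graph (hypothetical stand-in for a mesh)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(1)) - A
hks = heat_kernel_signature(L, times=np.array([0.1, 1.0, 10.0]))
print(hks.shape)  # (4, 3): one descriptor row per vertex
```

For large diffusion times the signature is dominated by the constant zero-eigenvalue mode, which is what makes it stable under pose (isometric) deformations.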
Recovery of the 3D Virtual Human: Monocular Estimation of 3D Shape and Pose with Data Driven Priors
The virtual world is increasingly merging with the real one. Consequently, a proper human representation in the virtual world is becoming more important as well. Despite recent technological advances in making the virtual human presence more realistic, we are still far from having a fully immersive experience in the virtual world, in part due to the lack of proper capturing and modeling of a virtual double. Thus, new methods and techniques are needed to obtain and recover a realistic virtual doppelgänger. This thesis aims to make virtual human representation accessible to every person, by showcasing how it can be obtained under inexpensive, minimalistic sensor requirements. Potential fields of application of the findings include the estimation of body shape from selfies, health monitoring and garment fitting.
In this thesis we investigate the problem of reconstructing the 3D virtual human from monocular imagery, mainly coming from an RGB sensor. Instead of focusing on the full avatar at once, we separately consider three of its constituent parts: the naked body, clothing and the human hand. The preeminent focus is on the estimation of 3D shape and pose from 2D images, e.g. taken from a smartphone, making use of data-driven priors in order to alleviate this ill-posed problem. We utilize discriminative methods, with a focus on CNNs, and leverage existing and new realistically rendered synthetic datasets to learn important statistics. In this way, our data-driven methods generalize well and provide accurate reconstructions on unseen real input data. Our research is not only based on single views and annotated ground-truth data for supervised learning, but also shows how to utilize multiple views simultaneously, or leverage them during training time, in order to boost the performance achieved from a single view at inference time. In addition, we demonstrate how to train and refine networks in an unsupervised fashion on unlabeled real data, by integrating lightweight differentiable renderers into CNNs.
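The "lightweight differentiable renderer" idea can be sketched in miniature: rasterize model points into a soft silhouette with smooth (Gaussian) blobs so that an image-space loss against an observed mask remains differentiable in the model parameters. The sketch below is a toy illustration under hypothetical sizes and names, not the renderer used in the thesis:

```python
import numpy as np

def soft_silhouette(points, size=16, sigma=1.0):
    """Soft rasterizer: each 2D point contributes a Gaussian blob;
    a pixel's occupancy is the max over blobs (smooth, hence
    amenable to gradient-based training)."""
    ys, xs = np.mgrid[0:size, 0:size]
    grid = np.stack([xs, ys], axis=-1).astype(float)            # (H, W, 2)
    d2 = ((grid[None] - points[:, None, None]) ** 2).sum(-1)    # (N, H, W)
    return np.exp(-d2 / (2 * sigma ** 2)).max(0)                # (H, W)

def silhouette_loss(points, target):
    """Image-space discrepancy between rendered and observed silhouette."""
    return float(((soft_silhouette(points) - target) ** 2).mean())

# Hypothetical observed mask, and a slightly wrong model estimate
target = soft_silhouette(np.array([[5.0, 5.0], [10.0, 10.0]]))
pts = np.array([[4.0, 4.0], [10.0, 10.0]])
loss = silhouette_loss(pts, target)
```

Because every pixel value varies smoothly with the point positions, the loss can backpropagate into a CNN that predicts those positions, which is the mechanism that enables unsupervised refinement on unlabeled real images.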
In the first part of the thesis, we aim to estimate the intrinsic body shape, regardless of the adopted pose. Under the assumptions of uniform background colours and poses with minimal self-occlusion, we show three different approaches for estimating the body shape: firstly, basing our estimation on handcrafted features in combination with CCA and random forest regressors; secondly, on simple standard CNNs; and thirdly, on more involved CNNs with generative and cross-modal components. We show robustness to pose changes and silhouette noise, and state-of-the-art performance on existing datasets, also outperforming optimization-based approaches.
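The CCA component of the first approach finds linear projections of two feature views (e.g. silhouette features and body-shape parameters) into a maximally correlated shared space. A minimal, self-contained sketch of classical CCA via whitening and SVD follows; the toy "views" sharing one latent factor are hypothetical and this is not the thesis implementation:

```python
import numpy as np

def cca(X, Y, k=1, eps=1e-8):
    """Classical CCA: canonical correlations are the singular values of
    Cxx^{-1/2} Cxy Cyy^{-1/2}; returns projections and correlations."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = len(X)
    Cxx = X.T @ X / n + eps * np.eye(X.shape[1])   # eps regularizes
    Cyy = Y.T @ Y / n + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):  # inverse square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    return Wx @ U[:, :k], Wy @ Vt[:k].T, s[:k]

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))                  # shared latent factor
X = np.hstack([z, rng.normal(size=(500, 2))])  # view 1: e.g. silhouette features
Y = np.hstack([z, rng.normal(size=(500, 2))])  # view 2: e.g. shape parameters
A, B, corr = cca(X, Y)
```

A regressor (e.g. a random forest) can then be trained in the shared space, where the two modalities are maximally aligned.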
The second part of the thesis tackles the estimation of garment shape from one or two images. Two possible estimations of the garment shape are provided: one that is deformed from a template garment (e.g. a t-shirt or a dress) and a second one that is deformed from the underlying body. Our analysis includes empirical evidence showing the advantages and disadvantages of either estimation method. We adopt lightweight CNNs in combination with a new realistically rendered garment dataset, synthesized under physically correct dynamic assumptions, in order to tackle the very difficult problem of estimating 3D shape from an image. Training purely on synthetic data, we are the first to show that garment shape estimation from real images is possible through CNNs.
The last and concluding part of the thesis focuses on the problem of inferring 3D hand pose from an RGB or depth image. To this end, our proposal is an end-to-end CNN system that leverages data from our newly proposed, realistically rendered hand dataset, consisting of 3 million samples of hands in various poses, orientations, textures and illuminations. Utilizing this dataset in a supervised training setting helped us not only with pose inference tasks, but also with hand segmentation. We additionally introduce network components based on differentiable renderers that enabled us to train and refine our networks with unlabeled real images in an unsupervised fashion, showing clear improvements. We demonstrate on-par and improved performance over state-of-the-art methods for two input modalities, on tasks varying from 3D pose estimation to gesture recognition.
How to Refine 3D Hand Pose Estimation from Unlabelled Depth Data?
Data-driven approaches for hand pose estimation from depth images usually require a substantial amount of labelled training data, which is quite hard to obtain. In this work, we show how a simple convolutional neural network, pre-trained only on synthetic depth images generated from a single 3D hand model, can be trained to adapt to unlabelled depth images from a real user’s hand. We validate our method on two existing datasets and a new one that we capture, both quantitatively and qualitatively, demonstrating that we compare favorably with state-of-the-art methods. Additionally, our method can be seen as an extension to existing methods trained on limited datasets, helping to boost their performance on new ones.
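The refinement idea above — adapting a synthetically pre-trained network to unlabelled real depth images — rests on a self-supervised consistency signal: render the model implied by the predicted pose back to a depth map and compare it with the observed depth. As a toy, hedged sketch (a crude sphere-splat stands in for a real hand model and renderer; all names are hypothetical):

```python
import numpy as np

def splat_depth(joints, size=24, radius=2.0):
    """Very coarse 'renderer': each joint (x, y, z) writes a disc of
    depth z into a depth map; the nearest surface wins (min depth)."""
    ys, xs = np.mgrid[0:size, 0:size]
    depth = np.full((size, size), np.inf)
    for x, y, z in joints:
        mask = (xs - x) ** 2 + (ys - y) ** 2 <= radius ** 2
        depth[mask] = np.minimum(depth[mask], z)
    return depth

def depth_consistency_loss(pred_joints, observed_depth):
    """Unsupervised loss: rendered vs. observed depth on pixels where
    both maps have a surface (finite depth). No labels required."""
    rendered = splat_depth(pred_joints)
    valid = np.isfinite(rendered) & np.isfinite(observed_depth)
    if not valid.any():
        return np.inf
    return float(((rendered[valid] - observed_depth[valid]) ** 2).mean())

# Hypothetical observed depth, a correct prediction, and a perturbed one
obs = splat_depth(np.array([[8.0, 8.0, 0.5], [14.0, 14.0, 0.6]]))
good = depth_consistency_loss(np.array([[8.0, 8.0, 0.5], [14.0, 14.0, 0.6]]), obs)
off = depth_consistency_loss(np.array([[8.0, 8.0, 0.9], [14.0, 14.0, 0.6]]), obs)
```

Minimizing such a loss over unlabelled frames pulls the network's predictions toward poses consistent with the observed depth, without any manual annotation.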
Monocular RGB Hand Pose Inference from Unsupervised Refinable Nets
CNN-based approaches are typically data-hungry, and when the task to solve is monocular RGB hand pose inference, real labelled training data is very hard to obtain. To overcome this, in this work we propose a new, large, realistically rendered, available hand dataset and a neural network trained on it, with the ability to refine itself on real unlabeled RGB images, given unlabeled corresponding depth images. We benchmark and validate our method on available and captured datasets, demonstrating that we match and even outperform state-of-the-art methods on tasks varying from 3D pose estimation to hand gesture recognition.
Assessment of Patient Satisfaction Using a New Augmented Reality Simulation Software for Breast Augmentation: A Prospective Study
Background: Breast augmentation is one of the most frequently performed plastic surgery procedures. Providing patients with realistic 3D simulations of breast augmentation outcomes is becoming increasingly common. Until recently, such programs were expensive and required significant equipment, training, and office space. New, simple, user-friendly programs have been developed, but to date there remains a paucity of objective evidence comparing these 3D simulations with post-operative outcomes. The aim of this study is to assess the aesthetic similarity between a pre-operative 3D simulation generated using Arbrea breast simulation software and real post-operative outcomes, with a focus on patient satisfaction. Methods: The authors conducted a prospective study of patients requiring breast augmentation. Patients were asked to assess how realistic the simulation was compared to the one-year post-operative result, using the authors’ grading scale for breast augmentation simulation assessment. Patient satisfaction with the simulations was assessed using a satisfaction visual analogue scale (VAS) ranging from 0 (not at all satisfied) to 10 (very satisfied). Patient satisfaction with the surgical outcome was assessed using the BREAST-Q Augmentation Module. Results: All patients were satisfied with the simulations and with the attained breast volume, with a mean VAS score of 8.2 ± 1.2. The mean simulation time was 90 s. The differences between the pre-operative and one-year post-operative values of the three BREAST-Q assessments were found to be statistically significant. Conclusions: Three-dimensional simulation is becoming increasingly common in pre-operative planning for breast augmentation. The present study assessed the degree of similarity of three-dimensional simulations generated using Arbrea Breast Software, and found that the software provided a very satisfying representation for patients undergoing breast augmentation.
However, we recommend informing patients that only the volume simulation is highly accurate; an absolute correspondence in breast shape between the simulation and the post-operative result cannot be guaranteed.